Goto

Collaborating Authors

 Đông Hà


OverThink: Slowdown Attacks on Reasoning LLMs

arXiv.org Artificial Intelligence

We increase overhead for applications that rely on reasoning LLMs-we force models to spend an amplified number of reasoning tokens, i.e., "overthink", to respond to the user query while providing contextually correct answers. The adversary performs an OVERTHINK attack by injecting decoy reasoning problems into the public content that is used by the reasoning LLM (e.g., for RAG applications) during inference time. Due to the nature of our decoy problems (e.g., a Markov Decision Process), modified texts do not violate safety guardrails. We evaluated our attack across closed-(OpenAI o1, o1-mini, o3-mini) and open-(DeepSeek R1) weights reasoning models on the FreshQA and SQuAD datasets. Our results show up to 18x slowdown on FreshQA dataset and 46x slowdown on SQuAD dataset. The attack also shows high transferability across models. To protect applications, we discuss and implement defenses leveraging LLM-based and system design approaches. Finally, we discuss societal, financial, and energy impacts of OVERTHINK attack which could amplify the costs for third-party applications operating reasoning models.


Prototype-Based Interpretability for Legal Citation Prediction

arXiv.org Artificial Intelligence

Deep learning has made significant progress in the past decade, and demonstrates potential to solve problems with extensive social impact. In high-stakes decision making areas such as law, experts often require interpretability for automatic systems to be utilized in practical settings. In this work, we attempt to address these requirements applied to the important problem of legal citation prediction (LCP). We design the task with parallels to the thought-process of lawyers, i.e., with reference to both precedents and legislative provisions. After initial experimental results, we refine the target citation predictions with the feedback of legal experts. Additionally, we introduce a prototype architecture to add interpretability, achieving strong performance while adhering to decision parameters used by lawyers. Our study builds on and leverages the state-of-the-art language processing models for law, while addressing vital considerations for high-stakes tasks with practical societal impact.


Improved Sensor-Based Animal Behavior Classification Performance through Conditional Generative Adversarial Network

arXiv.org Artificial Intelligence

Many activity classifications segments data into fixed window size for feature extraction and classification. However, animal behaviors have various durations that do not match the predetermined window size. The dense labeling and dense prediction methods address this limitation by predicting labels for every point. Thus, by tracing the starting and ending points, we could know the time location and duration of all occurring activities. Still, the dense prediction could be noisy with misalignments problems. We modified the U-Net and Conditional Generative Adversarial Network (cGAN) with customized loss functions as a training strategy to reduce fragmentation and other misalignments. In cGAN, the discriminator and generator trained against each other like an adversarial competition. The generator produces dense predictions. The discriminator works as a high-level consistency check, in our case, pushing the generator to predict activities with reasonable duration. The model trained with cGAN shows better or comparable performance in the cow, pig, and UCI HAPT dataset. The cGAN-trained modified U-Net improved from 92.17% to 94.66% for the UCI HAPT dataset and from 90.85% to 93.18% for pig data compared to previous dense prediction work.


Sponge Examples: Energy-Latency Attacks on Neural Networks

arXiv.org Machine Learning

The high energy costs of neural network training and inference led to the use of acceleration hardware such as GPUs and TPUs. While this enabled us to train large-scale neural networks in datacenters and deploy them on edge devices, the focus so far is on average-case performance. In this work, we introduce a novel threat vector against neural networks whose energy consumption or decision latency are critical. We show how adversaries can exploit carefully crafted $\boldsymbol{sponge}~\boldsymbol{examples}$, which are inputs designed to maximise energy consumption and latency. We mount two variants of this attack on established vision and language models, increasing energy consumption by a factor of 10 to 200. Our attacks can also be used to delay decisions where a network has critical real-time performance, such as in perception for autonomous vehicles. We demonstrate the portability of our malicious inputs across CPUs and a variety of hardware accelerator chips including GPUs, and an ASIC simulator. We conclude by proposing a defense strategy which mitigates our attack by shifting the analysis of energy consumption in hardware from an average-case to a worst-case perspective.